Integration of genetic and genomics resources in einkorn wheat enables precision mapping of important traits

Einkorn wheat (Triticum monococcum) is an ancient grain crop and a close relative of the diploid progenitor (T. urartu) of polyploid wheat. It is the only diploid wheat species having both domesticated and wild forms and therefore provides an excellent system to identify domestication genes and genes for traits of interest to utilize in wheat improvement. Here, we leverage genomic advancements for einkorn wheat using an einkorn reference genome assembly combined with skim-sequencing of a large genetic population of 812 recombinant inbred lines (RILs) developed from a cross between a wild and a domesticated T. monococcum accession. We identify 15,919 crossover breakpoints delimited to a median and average interval of 114 Kbp and 219 Kbp, respectively. This high-resolution mapping resource enables us to perform fine-scale mapping of one qualitative (red coleoptile) and one quantitative (spikelet number per spike) trait, resulting in the identification of small physical intervals (400 Kb to 700 Kb) with a limited number of candidate genes. Furthermore, an important domestication locus for brittle rachis is also identified on chromosome 7A. This resource presents an exciting route to perform trait discovery in diploid wheat for agronomically important traits and their further deployment in einkorn as well as tetraploid pasta wheat and hexaploid bread wheat cultivars.


Supplementary Note 1 Recombination Breakpoint Detection
To determine the most accurate position of recombination breakpoints, both the macroscopic region layout--its general allelic consensus, and obvious trends of regions at a chromosome-scale--must be utilized to account for misidentified markers and thin data as well as the microscopic region--marker to marker precision--to ensure the precise location of the recombination break is identified between the two differing markers with as much resolution possible. Additionally, even at the F6-7 inbreeding generation, heterozygous regions should still constitute roughly 1-2% of the genome for any given RIL, making the ability for the algorithm to detect heterozygous sites crucial. Although there are previously described crossover detection methods (Huang et al. 2009), these lack the robustness to properly identify breakpoints in low marker density regions, identify heterozygous regions with precision, and maintain resolution in sequencing error-prone data. First, a roaming score was sequentially calculated along an entire chromosome: an allele belonging to parent 1 (L95), would be summed as +1 point. An allele belonging to the second parent, parent 2 (L96), would sum as -1 point ( figure 1).
Each variant site with a neighboring allele of the alternate parents were identified as 'bends', and a moving window was used to analyze markers to the left and right of these positions. A dual window of a combined width (or diameter) of 6,500,000 bp was used instead of a fixed number of markers to account for varying marker density. After markers had been selected inside both windows, a simple linear fit using least squares regression was performed on each side to determine the slope of the line of best fit. If either paired window had too low of marker density as defined as less than 25% of the average marker density of the chromosome or constituting only 3 markers, it was skipped and marked as low confidence. If the slope of the line was within a given tolerance of +1, -1, or 0, and the collinearity of the points were above a threshold, then that side of the window would be designated as a P1, P2, or Heterozygous region respectively. A tolerance of ± 0.2 was allowed for a homozygous window, and a tolerance of ±0.4 was allowed for a heterozygous window. This leaves the ranges -0.8 to -0.4 and 0.4 to 0.8 unassigned.
Each window was given a score based on the difference between the two slopes of each side. If the moving window pair had differing regions on either side, it was considered as a Candidate Recombination Breakpoint (CRB). If the moving window had a score below 0.1 for the difference in slope between the two lines of best fit, then that position would be marked an 'anchor' site, as its sides represent very similar regions. If two bends were more than half the distance of the sliding window away from each other, that position was also tested and would result in an anchor site between them to ensure no region was unrepresented.
These CRBs and anchor sites (with their respective information) were arranged in a list by physical position. Sites were then filtered so that only the highest scoring CRBs and lowest scoring anchor sites exist within their local neighborhood (determined by the window radius of 2.25 Mb), so that only the highest-scoring candidates of their region-to-region transition remain. Anchor sites were merged if adjacent. Finally, using the anchor sites, the continuity between each site was tested--that is, that the same type of region (P1, P2, or Heterozygous) would be matching from the right side of a CRB to the left of the next CRB continuously through the chromosome.
A recursive logic tree then filtered the CRBs down to the candidates that best represent the marker data seen. First, CRBs were removed if either of the two regions that were determined to be on either side of it could not match with any other CRB between two anchor sites. The highest scoring duplicate CRB between two anchor sites whose left and right regions matched the left and right anchor sites beside it was kept, the duplicate being removed. After filtering, if there were no surviving CRBs between two differing anchor sites (there was not a recombination breakpoint found that fits the region transition required between the two anchor sites), a secondary macroscopic search algorithm begins. This approach is useful to finding a transition point between two differing regions in a wider scale. It first identifies the width of the uncertain region. The algorithm determines if it is searching for an upward or downward bend from the region types of the anchors. For example, a region transition from P1 to P2 would mean that the region between the +1 and -1 sloped anchor sites must bend downwards at some point. From between the closest two anchor sites, a sliding window of diameter n markers, with n equal to the marker distance between the two anchor sites, would investigate every bend of the data. Each score is recorded. The window with the highest score, and with the expected bend (upwards or downwards), would be inserted as a CRB between the two anchor sites. Upon reaching continuity through every anchor site and to each end of the chromosome, the CRBs were then considered high-confidence recombination breakpoints (RBs), and the recombination breakpoints of each chromosome under each RIL were recorded.